Microslit Nod-shuffle Spectroscopy — a technique for achieving 

very high densities of spectra 
ABSTRACT

We describe a new approach to obtaining very high surface densities of optical spectra in 
astronomical observations with extremely accurate subtraction of night sky emission. The ob- 
serving technique requires that the telescope is nodded rapidly between targets and adjacent sky 
positions; object and sky spectra are recorded on adjacent regions of a low-noise CCD through 
charge shuffling. This permits the use of extremely high densities of small slit apertures ('mi- 
croslits') since an extended slit is not required for sky interpolation. The overall multi-object 
advantage of this technique is as large as 2.9× that of conventional multi-slit observing for an
instrument configuration which has an underfilled CCD detector and is always > 1.5 for high target
densities. The 'nod-shuffle' technique has been practically implemented at the Anglo-Australian
Telescope as the "LDSS++ project" and achieves sky-subtraction accuracies as good as 0.04%,
with even better performance possible. This is a factor of ten better than is routinely achieved
with long-slits. LDSS++ has been used in various observational modes, which we describe, and
for a wide variety of astronomical projects. The nod-shuffle approach should be of great benefit
to most spectroscopic (e.g., long-slit, fiber, integral field) methods and would allow much deeper
spectroscopy on very large telescopes (10m or greater) than is currently possible. Finally we dis- 
cuss the prospects of using nod-shuffle to pursue extremely long spectroscopic exposures (many 
days) and of mimicking nod-shuffle observations with infrared arrays. 
1. Introduction

The problem of subtracting the night sky fore- 
ground emission is a critical one for astronomical 
spectroscopy. The task is particularly acute in the 
red part of the spectrum (600-1000 nm) as there
are numerous hydroxyl (OH) bands which domi- 
nate the light giving a bright background. Many 
authors have recognized over the past twenty 
years that low to moderate resolution spectroscopy 
in this band is ultimately limited by systematic
uncertainty associated with sky subtraction
(e.g., Dressler 1984).

In some respects, it is surprising that optical 
astronomy has been slow to recognize an impor- 
tant technique utilized by near-infrared astron- 
omy, i.e., beam-switching. Here, the background 
signal is very strong, is highly variable, and influences
all observations (e.g., Ramsay et al. 1992).
A common implementation of beam-switching is 
where the secondary mirror 'chops' between a tar- 
get object and a sky field while the infrared array 
is read out continually.^

This is perhaps because there is a conflict be- 
tween the desire to beam-switch rapidly, and sample
the sky contemporaneously, and the desire to
take long integrations to minimize the effect of 
readout noise. This is especially true for modern, 
very low noise CCD detectors. 

The underlying principle of the nod-shuffle
technique is simply that a CCD detector can be
used to store two images of a field, imaged
quasi-simultaneously (Cuillandre et al. 1994;
Bland-Hawthorn 1994; Sembach & Tonry 1996). By
using 'charge-shuffling', charge can be moved from
an illuminated region to a storage region. This
process does not incur readout noise and takes
only a fraction of a second, since charge can
be shifted between CCD rows two to three orders
of magnitude faster than it can be read out.
If this shuffling is synchronized to telescope
motion, two interleaved exposures of object and sky
can be imaged side by side at the detector. Note
three important facts: (i) the images are obtained
through identical optical paths, (ii) the imposed
flatfield structure is identical for both images, and
(iii) the CCD is read out only once.

The use of shuffling techniques in astronomy 
can be traced to early attempts to improve the 
performance of imaging polarimeters (McLean 
et al. 1981; Stockman 1982). Since that time, 
charge-shuffling has been little utilised. Part of the 
reason may stem from experiments by Lemonier
& Piaget (1983). By rapidly shifting charge backwards
and forwards many times (pocket pumping),
they were able to identify local defects in the
potential profile (trapping sites) within the silicon
substrate. By the end of the 1980s, traps and
deferred charge were still a fundamental limitation
to repeated charge shuffle operations (Blouke
et al. 1988).

The development of charge-shuffling at the
Anglo-Australian Observatory dates back to the
1994 Marseilles conference on imaging spectrographs
(Comte & Marcelin 1995). It was here
that the first results of integral field spectrographs
were presented, arguably the most important
development in optical instrumentation in the past
decade. It was clear, and remains true, that the
fundamental limitation of this powerful technology
is the difficulty of accurate sky subtraction
(Bland-Hawthorn 1995).

^ 'Chopping' refers to a moving secondary mirror while the
primary remains fixed on the object; we use 'nodding' to
indicate a fixed secondary where the pointing of the primary
mirror alternates between sky and an object field.

Key developments in CCD manufacture have
made charge-shuffling a realistic prospect and an
important consideration in all future instrument
design. First, the latest generation CCDs (EEV,
MIT Lincoln Lab) have very low read noise
(~1 e^-), negligible dark current, high purity and very
high charge transfer efficiency (99.9999%). Secondly,
the manufacturing process prefers to generate
rectangular arrays^ which provide for storage
regions. Bland-Hawthorn & Barton (1995)
demonstrate that, with modern CCDs, it takes
more than a hundred shuffle operations before
bulk trapping sites start to compromise the data.

In this paper, we describe the development at 
the AAO of the 'nod-shuffle' method founded on 
the principle of CCD charge shuffling. This differ- 
ential technique has resulted in two important ex- 
perimental breakthroughs. First, the object and 
sky can be measured quasi-simultaneously. As 
we show, the main limit to the accuracy of sky- 
subtraction is the rapidity of nod-shuffling com- 
pared to the temporal power spectrum of sky 
brightness variations. Secondly, nod-shuffle allows
for a considerable increase in the multi-object gain
of a spectrograph, up to 2.9× more objects per
unit observing time using small 'microslits,' for
fields with high object densities. We have implemented
the nod-shuffle method with the Low
Dispersion Survey Spectrograph (LDSS) on the
Anglo-Australian Telescope (AAT) and have obtained
fractional residuals as low as 4 × 10^-4.

The plan of this paper is as follows: in Section 2
we describe the nod-shuffle concept and discuss
qualitatively the sky-subtraction and multiplex
advantages to be gained. In Section 3 we describe
in detail our implementation of nod-shuffle
at the AAT using the Low Dispersion Survey
Spectrograph and show some example data. In
Section 4, we show the increased multi-object gain
which becomes possible via the nod-shuffle operation.
We quantify the sky-subtraction accuracy
in Section 5 and discuss ways in which it might
be improved further. In Section 6, we illustrate
key observing modes for LDSS++ which are
facilitated by the use of microslits. Finally, we
discuss future prospects for the nod-shuffle observing
mode.

^ The photofab process uses a 25-30 mm reticle which
restricts the 'row' dimension of a CCD. The reticle is stepped
down the wafer and the new circuit is stitched to the
previous pattern.

2. The Nod-Shuffle Concept 

The concept behind charge shuffling is that 
unilluminated portions of a CCD can be used for 
storage. The image formed on an illuminated por- 
tion can be 'shuffled' very quickly into a stor- 
age area by clocking before being shuffled back 
at a later stage. For example, with the AAO-1
CCD controller and the Thompson 1024×1024
format CCD, a single row can be shifted upwards
or downwards in 12.5 μs, compared to 30-160 ms
when clocked through the output amplifiers.^ The
shift operation is a factor of 4 slower for the Tek 
1024x1024 format and MITLL 2048x4096 format 
CCDs. Since the shuffle operation does not in- 
volve the read-out amplifiers, the primary source 
of noise is now associated with charge transfer 
within the substrate (Janesick & Elliott 1992). 
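
To put these clocking rates in perspective, the short Python sketch below compares the time needed to shuffle a block of rows with the quoted per-row time through the output amplifiers. It is an arithmetic illustration only: the function names and the choice of a 1365-row shuffle on a device that shifts four times more slowly are assumptions made for the example, while the per-row times are the values quoted above.

```python
# Illustrative arithmetic only; names and the example row count are assumptions.
ROW_SHIFT_S = 12.5e-6         # quoted time to shift one row on the Thompson CCD (s)
ROW_READ_S = (30e-3, 160e-3)  # quoted per-row time through the output amplifiers (s)
SLOW_FACTOR = 4               # Tek and MITLL CCDs shift rows 4x more slowly


def shuffle_time(rows, per_row_s=ROW_SHIFT_S):
    """Time in seconds to shuffle an image by `rows` rows without reading out."""
    return rows * per_row_s


def shift_vs_read_ratio(read_s, shift_s=ROW_SHIFT_S):
    """How many times faster a row shift is than clocking a row through the amplifiers."""
    return read_s / shift_s


if __name__ == "__main__":
    t = shuffle_time(1365, ROW_SHIFT_S * SLOW_FACTOR)
    print(f"1365-row shuffle on a 4x slower device: {t * 1e3:.1f} ms")
    lo, hi = (shift_vs_read_ratio(r) for r in ROW_READ_S)
    print(f"row shift is {lo:.0f}x to {hi:.0f}x faster than row readout")
```

With these inputs a shuffle of more than a thousand rows still takes well under a tenth of a second, which is why the dead time of a nod is set by the telescope motion rather than by the detector.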

Each vertical clock shifts the complete image 
on the CCD one row up towards the readout reg- 
ister. The row that was next to the readout reg- 
ister gets clocked in to the readout register and 
cannot be reverse clocked back into the image. At 
the other end of the image, a 'clean' row is gen- 
erated. This happens for shifting in the 'forward' 
direction. Clocking in the reverse direction moves 
the complete image one row away from the read- 
out register for every vertical clock applied to the 
CCD. A clean row is generated next to the read- 
out register and at the other end one image row is 
lost. 
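
The bookkeeping of forward and reverse clocking can be sketched with a toy model in Python. This is a hypothetical illustration, not the controller code: the detector is a list of rows with index 0 next to the readout register, and `None` stands for a clean (empty) row.

```python
def clock_forward(rows):
    """One vertical clock toward the readout register.

    The row next to the register (index 0) leaves the image area and a
    clean row appears at the far end. Returns (new_rows, row_sent_to_register).
    """
    return rows[1:] + [None], rows[0]


def clock_reverse(rows):
    """One vertical clock away from the readout register.

    A clean row appears next to the register and the row at the far end
    is lost. Returns (new_rows, row_lost).
    """
    return [None] + rows[:-1], rows[-1]


if __name__ == "__main__":
    image = ["r0", "r1", "r2", "r3"]
    shifted, to_register = clock_forward(image)
    restored, lost = clock_reverse(shifted)
    print(shifted, to_register)  # r0 has gone into the readout register
    print(restored, lost)        # reversing does not bring r0 back
```

The point of the toy model is that a forward clock followed by a reverse clock is not the identity: whatever crosses the edge of the image area is gone, which is why storage regions must be reserved on the detector.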

In order to produce two contiguous images side
by side on the detector via shuffling, the maximum
field of view (i.e., number of rows) which can be
shifted without loss of information for the exposed
or stored image is one third of the detector's column
dimension. The reason is clear: when the
detector is clocked in one direction, rows at the
detector edge are lost (cf. Figure 1). More generally,
shuffling between m partitions uses m/(2m - 1)
of the CCD for holding the separate observations,
while the remainder is (a) used for temporary storage,
and (b) rendered useless by the shuffle process
(i.e., this fraction is never illuminated). A fuller
technical description of charge-shuffling is given in
Bland-Hawthorn et al. (2000).

^ The AAO-1 controller was upgraded in 1998 resulting in a
fivefold increase in pixel rate. But this is still three orders of
magnitude slower than the rate that charge can be shifted
between rows without reading out.
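
The m/(2m - 1) rule can be checked with a few lines of Python. The helper names below are hypothetical; the 4096-row example is the long axis of the MITLL device mentioned earlier, for which m = 2 gives the one-third partition.

```python
from fractions import Fraction


def storage_fraction(m):
    """Fraction of the CCD holding the m separate observations, m / (2m - 1)."""
    if m < 1:
        raise ValueError("need at least one partition")
    return Fraction(m, 2 * m - 1)


def rows_per_partition(total_rows, m):
    """Rows available to each partition: the detector is split into 2m - 1 equal parts."""
    return total_rows // (2 * m - 1)


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, storage_fraction(m), rows_per_partition(4096, m))
```

For m = 2 this gives two thirds of the detector holding data, one third per image, or 1365 rows each on a 4096-row device.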

The nod-shuffle image sequence developed for
LDSS observing utilises this underfilled, large-shuffle
mode and is illustrated in Figure 1. The
observing sequence is as follows:

1. The target objects are acquired with the 
telescope on to the spectrograph mask slits 
(these may be true slits or simple apertures 
such as holes). 

2. The shutter is opened for an OBJECT exposure
(usually 10-100 secs in duration), and dispersed
spectra of OBJECTS+SKY are accumulated
in the central area.

3. The shutter is closed. 

4. The OBJECT image is shuffled up, by clocking
the CCD charge pattern, to an upper storage
area which is unilluminated.

5. The telescope is moved to a SKY position. 
(This can be a truly blank patch or can sim- 
ply involve moving the objects some way 
along the slits). 

6. The shutter is re-opened and dispersed SKY 
spectra are accumulated, for the same ex- 
posure time as the OBJECT, in the blank 
central area. 

7. The shutter is closed, the charge is shuffled
back down, bringing the OBJECT image back in
to the center and the SKY image into blank
storage. The telescope is moved back to the
OBJECT position.

8. The shutter is opened and more OBJECT 
data is accumulated. 

9. The sequence OBJECT-SKY-OBJECT- 
SKY-... is repeated for the rest of the expo- 
sure. 
Fig. 1. Illustration of the nod-shuffle procedure implemented in the LDSS spectrograph showing progressive
stages of image formation: (a) The spectra of the objects through the slits are imaged onto the central portion
of an oversized CCD. (b) The first image is shuffled up into a storage region (with the shutter closed), the
telescope is offset to adjacent sky, which is then imaged onto the now empty central region of the detector.
(c) The object image is shuffled back and additional object photons are imaged. (d) Sky is shuffled back and
imaged. Steps (c) and (d) are cycled continuously until the integration is complete.
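
The observing sequence can be mimicked with a toy one-dimensional detector. The Python sketch below is a hypothetical simulation, not the instrument software: the detector is a list of 3N rows (upper storage, illuminated center, lower storage), the sky rate is allowed to change from cycle to cycle but is assumed constant within a cycle, and all names are illustrative.

```python
def shift(rows, k):
    """Shift charge by k rows (k > 0 is 'up', toward index 0); vacated rows are clean."""
    if k >= 0:
        return rows[k:] + [0.0] * k
    return [0.0] * (-k) + rows[:k]


def nod_shuffle(obj, sky_rates, t_exp):
    """Run one OBJECT/SKY cycle per entry of sky_rates and return OBJECT minus SKY.

    obj       -- object count rate in each of the N illuminated rows
    sky_rates -- sky count rate during each cycle (drifts between cycles)
    t_exp     -- dwell time at each position
    """
    n = len(obj)
    ccd = [0.0] * (3 * n)              # [upper storage | center | lower storage]
    for sky in sky_rates:
        for i in range(n):             # OBJECT exposure: object + sky in the center
            ccd[n + i] += (obj[i] + sky) * t_exp
        ccd = shift(ccd, n)            # shutter closed: OBJECT image into upper storage
        for i in range(n):             # SKY exposure in the now empty center
            ccd[n + i] += sky * t_exp
        ccd = shift(ccd, -n)           # OBJECT back to center, SKY into lower storage
    object_image = ccd[n:2 * n]
    sky_image = ccd[2 * n:]
    return [o - s for o, s in zip(object_image, sky_image)]


if __name__ == "__main__":
    residual = nod_shuffle(obj=[0, 5, 0], sky_rates=[100, 120, 90], t_exp=30)
    print(residual)   # only the object signal survives the subtraction
```

Because both images accumulate under the same sequence of sky levels, the drift from cycle to cycle drops out of the difference; only sky changes faster than one cycle would leave a residual.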
At the AAT, the OBJECT and SKY exposures
are typically 30 secs, repeated to fill up an 1800
sec exposure before readout. Sky subtraction then
consists of extracting the two regions and calcu- 
lating the difference image. This technique, which 
we call "nod-shuffle," gives extremely precise sky- 
subtraction for the following reasons: 

a The OBJECT and SKY are observed through
identical slits/apertures. The effect of any
irregularities cancels out in the subtraction.

b The OBJECT and SKY are imaged on to
exactly the same pixels on the detector. The
optical path is identical. The pixel response
is identical. (The response is that of the
pixel where the image is measured; the
storage pixels have no effect.)

c The OBJECT and SKY are observed quasi-simultaneously,
thus the effect of short-timescale
temporal sky variations cancels out in the
subtraction. This is quantified below in Section 5.

d The OBJECT and SKY positions can be extremely
close (a few arcsecs) so spatial sky
variations are not significant.

e Because of the identical light path and quasi-simultaneity,
the effects of fringing on the
detector from night sky lines cancel out.

f Similarly, the effects of any instrument flexure
during the course of the exposure cancel out.

g There is no need to re-sample and interpolate
the sky for the subtraction, so there are no
numerical artifacts introduced.

h The presence of any DC level in the detector
due to bias, dark current, or scattered light
does not affect the sky-subtraction. If it is
constant it cancels; if it varies across the detector
(including the unilluminated regions)
it will not cancel but will still not affect the
sky subtraction.
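
Points (a) and (b) can be made concrete with a small numerical sketch. The Python below is a hypothetical illustration with made-up throughput numbers, and it deliberately ignores flat-fielding: it compares a difference formed through the same aperture and pixel with one formed from a neighbouring part of a slit whose throughput differs by about 1%.

```python
from fractions import Fraction as F


def recorded(signal, slit_throughput, pixel_response):
    """Counts recorded for an incident signal through a given aperture and pixel."""
    return signal * slit_throughput * pixel_response


def nod_shuffle_difference(obj, sky, throughput, response):
    """OBJECT+SKY and SKY both pass through the same aperture and the same pixel."""
    return recorded(obj + sky, throughput, response) - recorded(sky, throughput, response)


def longslit_difference(obj, sky, t_obj, r_obj, t_sky, r_sky):
    """Sky is taken from elsewhere on the slit, with its own throughput and response."""
    return recorded(obj + sky, t_obj, r_obj) - recorded(sky, t_sky, r_sky)


if __name__ == "__main__":
    obj, sky = F(1), F(100)                      # object at 1% of the sky level
    same = nod_shuffle_difference(obj, sky, F(9, 10), F(21, 20))
    other = longslit_difference(obj, sky, F(9, 10), F(21, 20), F(91, 100), F(21, 20))
    print(float(same), float(other))
```

In the first case the sky term cancels identically and the result is the object signal scaled by the aperture and pixel response. In the second, an uncorrected throughput mismatch of about 1% leaves a sky residual larger than the object itself.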

Of course this is a much more complex observing
sequence than simply acquiring objects on to
slits and staring. There is also a penalty for the
precise sky-subtraction: √2 more noise in the resulting
spectra because of the subtraction, compared
to a very long slit, though the systematics
in the sky removal are expected to be greatly
improved.

However nod-shuffle offers another great advantage
over conventional multislit spectroscopy: it
permits a large increase in the achievable object
multiplex. Because a long slit is no longer required
for sky-subtraction via interpolation, the apertures
only need be large enough to cover the objects. We
term these "microslits". Additionally they need
not be slits; they can be apertures of any shape
such as circles. If we take the example of observing
faint 24th magnitude galaxies, only a 1 arcsec
aperture is required due to their small size (Small
et al. 1995). Comparing this to typical multislit
observations with 10-15 arcsec long slits (Glazebrook
et al. 1995), we can see that we would expect
10-15× as many slits to be squeezed on to
the mask without spectral overlap. We quantify
these multiplex gains below in Section 4.

Fig. 2. Illustration of the nod-shuffle geometry
in the case in which the detectors are overfilled by
the instrument field of view (FOV) (in this case
two detectors are shown). Unilluminated regions
must be taken from the active FOV giving a 50%
overhead resulting in a stripe pattern. Note the
stripe width can be as small as individual spectra,
however it is desirable to make them larger to minimise
area lost to edge effects at the stripe boundaries:
a region of width ~ the instrument PSF will be
badly subtracted. A reasonable width would be
large banks of 20-50 spectra.

Finally we note that for multi-object spec- 
troscopy there is an alternative mode of observing 
where the charge is shuffled only a few pixels. Because
a slit mask blocks out light, any part of the
CCD can be used for storage. This is particularly
useful because it scales to multiple, mosaicked
CCDs, i.e. when the camera FOV is much bigger
than the detectors. This case is illustrated in Figure 2.
A penalty here is that half the available
detector area must be used for storage when it
could be used on-sky; however, as we demonstrate
below, it still gives a formal multiplex advantage
in the high source density limit.

3. The AAT/LDSS++ Implementation

The practical implementation we will describe 
was developed using the AAT's Low Dispersion 
Survey Spectrograph (LDSS), which came to be 
known as the LDSS++ project. LDSS is a wide- 
field multislit spectrograph with a 12 arcmin field 
of view. A large collimator re-images the telescope
pupil; grisms and/or filters can be inserted in
this space, which is then imaged through a camera onto
a CCD detector (Wynne & Worswick 1988; Glazebrook
2000). The grism can be taken out for direct
imaging of the field or the mask; this is used
to acquire the field on to the mask accurately.

LDSS has recently been equipped with a
volume-phase holographic grating (VPH; Barden,
Arns & Colburn 1998) and a MITLL deep-depletion
2048 × 4096 CCD detector with 15 μm
pixels. These two upgrades give a considerable
improvement in the red 500-1000 nm throughput
of the system: the gain at 700 nm is a factor of 2
(Glazebrook 1998).

The LDSS field of view is circular and is ~2000
pixels on the detector (0.39 arcsec pix^-1 scale).
The shuffle direction is along the long axis of the
CCD, perpendicular to the dispersion direction
exactly as shown in Figure 1. This is not absolutely
necessary but is done because it is easier to
block the adjacent storage areas spatially by using
the mask; otherwise some sort of spectral blocker
would be required and this would not be ideal due
to offsets between slits. In nod-shuffle mode we
thus use the central 2048 × 1365 pixels. It represents
approximately the underfilled case described
in Section 2.

The implementation of our nod-shuffle scheme
is as follows. At the start of a nod-shuffle run, a
shuffle sequence is downloaded to the CCD controller
micro and the instrument sequencer micro
from the VAX computer; the instrument sequencer 
also receives a telescope command set. The VAX 
then tells the instrument sequencer and the CCD 
controller to 'run'. The controller runs software 
which interprets the shuffle sequence, clocking the 
charge up and down and driving the CCD shutter. 
It dictates each step by triggering an event with 
an 'external sync' pulse for each phase of the op- 
eration. The triggers occur after fixed time inter- 
vals since there is presently no handshake from the 
telescope. The number and nature of the triggers 
depend on whether there is to be guiding at either 
the object or sky position (OFFSET mode), at nei- 
ther (OFFSET NO GUIDE mode) or both (AXES 
mode). With the output pulse, the CCD controller 
toggles the status of an I/O line and waits for a 
given delay time. The instrument sequencer reads 
the I/O line and, when required, writes telescope 
control commands to a port on the VAX/VMS
computer system. A program running on the VAX 
reads these commands, translates them and routes 
them via the CAMAC interface to the telescope 
control Interdata computer. 

There is no feedback in this system: the CCD 
controller does not know the state of the tele- 
scope. Ideally of course it would, but this would 
require complete re-engineering of the whole ob- 
serving system. Instead the telescope movement 
is allowed for by predetermined time delays. The 
controller waits a given amount of time between 
shuffles with the shutter closed to allow the tele- 
scope to finish its 'offset and stabilise' action. For 
small offsets of a few arcsec, the AAT does this in
about 1 second; typically we allow 2 secs dead time
in a 30 sec integration time. It was verified that 
this was adequate by taking long-exposure direct 
images of star fields in nod-shuffle mode and look- 
ing for image elongation along the offset direction. 
The two shuffled images can also be subtracted to 
look for elongated residuals — none were found 
down to the noise level. 
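
The cost of the fixed delays can be estimated with a simple time budget. The Python sketch below is illustrative only and rests on stated assumptions: the 1800 sec exposure is taken to be the total shutter-open time split equally between OBJECT and SKY, the 2 sec dead time is taken to be additional to each 30 sec dwell, and the function name is hypothetical.

```python
def nod_shuffle_budget(total_open_s=1800, dwell_s=30, dead_s=2):
    """Time budget for one readout under the assumptions stated in the text above."""
    n_exposures = total_open_s // dwell_s          # OBJECT and SKY dwells combined
    n_cycles = n_exposures // 2                    # one OBJECT + one SKY per cycle
    open_s = n_exposures * dwell_s
    wall_clock_s = n_exposures * (dwell_s + dead_s)
    return {
        "cycles": n_cycles,
        "on_object_s": n_cycles * dwell_s,
        "wall_clock_s": wall_clock_s,
        "shutter_open_fraction": open_s / wall_clock_s,
    }


if __name__ == "__main__":
    for key, value in nod_shuffle_budget().items():
        print(key, value)
```

Under these assumptions a single readout contains 30 OBJECT/SKY cycles and the shutter is open for a little under 94% of the elapsed time, so the lack of a telescope handshake costs only a few percent in efficiency.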

Some sample data of nod-shuffle spectra are
shown in Figure 3. These were taken for a redshift
campaign in the Hubble Deep Field South (Glazebrook
et al. 2000a) during commissioning of the
nod-shuffle system. We placed 225 microslits (circular
~1 arcsec apertures) on targets along the
1365 pixel spatial axis (= 9 arcmin); the spectra
are dispersed along the horizontal 2048 pixel axis
(= 5300 Å). The LDSS PSF is a Gaussian with 2
pixels FWHM at the field center degrading to 3 pixels
at the field edge. The microslits are spaced
at intervals of at least 4 pixels vertically (subject
to target availability) so their spectra are significantly
separated. The horizontal spread of the
slits was up to 3 arcmin so as not to introduce
significant wavelength offsets between spectra.

Fig. 3. Sample data from the HDF-S observing campaign. Panel (a) shows the slitmask used (225
microslits), panel (b) shows the raw shuffled data, and panel (c) shows the difference image zoomed in. The slits
in this case are circular apertures, so the spectra appear as tramlines a few pixels wide horizontally across
the detector. Panel (d) shows two sample extracted spectra of a bright and a faint galaxy; the solid lines are
the spectra (unfluxed) and the lower dotted lines show the theoretically achievable noise level as determined
by shot and read noise (shot dominates). Bad columns are masked out of the plot. Sky residuals are only
seen near extremely bright lines (5577 Å and 6300 Å are marked as examples) and are entirely consistent with
pure Poisson variance.

It can be seen in Figure 3 that the form of 
this data is somewhat akin to spectra from fiber 
optic spectrographs in that each object produces 
a tramline which is traced and extracted. However,
in this case the extraction is done after sky-
subtraction and there are significant wavelength 
offsets between spectra. 

4. Multiplex gains 

To quantify the multiplex gain we must com- 
pare the number of spectra observable per unit 
time to the same limiting signal/noise ratio ver- 
sus the longslit case where the sky is subtracted 
by interpolation. We must observe for longer with 
nod-shuffling to reach the same signal/noise; however,
this is more than balanced by an increase in
the number of slits we can fit on the mask. We
call this the 'nod-shuffle advantage' (NSA).

The OBJECT-SKY subtraction in the nod-shuffle
case introduces √2 extra subtraction noise.
First we consider at what length the longslit sub- 
traction introduces the same amount. We will as- 
sume the longslit has length n elements, where an 
element is taken as the spatial extent of the target 
objects (thus n = 1 for the microslit). 

Conventionally the background along the slit, 
excluding the object, is fitted with either a lin- 
ear model or a higher-order polynomial. Typically 
the background level will vary by a few percent 
across the slit due to instrumental effects such as 
slit alignment and optical distortion and this slope 
will vary with wavelength due to the structure in 
the sky spectrum. The fitting will also be limited 
by the presence of slit irregularities. This is dis- 
cussed in more detail below in Section 5. For now 
we will compute the ideal limit for a smooth slit. 

Accurate sky-subtraction in the neighbourhood
of the bright sky emission lines requires fitting at
least a general linear model to each wavelength
channel, thus the error at the object location is
the error on the intercept of the fitted line (σ_c) from
the line fitting from N points:

$$\sigma_c^2 = \frac{2(2N+1)}{N(N-1)}\,\sigma^2$$

where σ is the noise on each point and for simplicity
we have ignored the omission of the central
object point. We now consider the following question:
as the slit length n increases, at what point
does subtracting the linear fit introduce less noise
than nod-shuffling, i.e. σ_c² + σ² < 2σ²?

This occurs at n = 6, after allowing for more
complex formulae where the central point is omitted.
Instead of the slit we could in principle substitute
6 microslits. There is a factor of two nod-shuffling
overhead either temporally (due to the
sky position) or spatially (if we move the object
between adjacent slit positions we have 3 pairs
rather than 6 objects). For n = 6 we calculate
σ_c² = 0.95σ² and so the NSA is calculated to be
2.9. As slits become longer the NSA increases further
and tends to n/4 for large n. While we can
fit on n× more slits we have to observe 4× longer
to allow for the two positions overhead and subtraction
noise.
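
These figures can be reproduced with a short calculation. The Python sketch below is one reading of the argument rather than the original computation: it assumes the sky samples sit at positions x = 1...n along the slit, that the intercept of an unweighted least-squares line is evaluated at x = 0, and that one interior point (the third, for n = 6) is dropped for the object. The function names are hypothetical.

```python
def intercept_variance_ratio(xs):
    """sigma_c^2 / sigma^2 for an unweighted straight-line fit, evaluated at x = 0."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return sxx / (n * sxx - sx * sx)


def nod_shuffle_advantage(n, omit=None):
    """NSA for a slit of n elements replaced by n microslits.

    Variances are in units of sigma^2. The longslit result carries the object
    pixel noise plus the fitted-sky noise; nod-shuffle carries 2 sigma^2 and a
    factor-of-two overhead for the second position.
    """
    xs = [x for x in range(1, n + 1) if x != omit]
    longslit_var = 1.0 + intercept_variance_ratio(xs)
    nod_shuffle_var = 2.0
    time_penalty = nod_shuffle_var / longslit_var   # extra exposure for equal S/N
    return (n / 2.0) / time_penalty


if __name__ == "__main__":
    ratio = intercept_variance_ratio([1, 2, 4, 5, 6])
    print(f"sigma_c^2 / sigma^2 = {ratio:.2f}")
    print(f"NSA(n = 6) = {nod_shuffle_advantage(6, omit=3):.1f}")
    print(f"NSA(n = 400) / (n / 4) = {nod_shuffle_advantage(400) / 100:.2f}")
```

Under these assumptions the intercept variance at n = 6 is 0.95σ² and the advantage is 2.9, and for long slits the advantage approaches n/4 as the fitted-sky noise becomes negligible.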

In practice, as the slit becomes longer more in- 
strumental effects come into play and a linear fit 
no longer improves the residuals. Often a higher- 
order polynomial is used to allow for curvature, 
however this will introduce yet more noise as there 
are more free parameters. In practice a slit length 
of 15-20 arcsec is the useful limit, if slits are this 
large the NSA is 4-5. 

For the overfilled case illustrated in Figure 2 
there is another additional factor of two for charge 
storage in regions which could otherwise be used 
for observations; nevertheless the NSA is still 1.5 
exceeding the longslit case and providing better 
sky-subtraction. 

Of course the theoretical NSA is only achieved 
if the object density is high enough to allow close 
spacing of microslits. In the very low density 
regime where a very long slit can be placed on 
each object with no concomitant multiplex loss
the NSA is only 0.5, i.e. we must observe twice
as long to balance the √2 subtraction noise. In
practice however for faint spectroscopy typical slit
spectroscopy is dominated by residual systematics
at the 0.5-1% level (see Section 5) and not random
noise where the lines are bright. And at low resolutions
(R < 2000) a large fraction (~50%) of the
red spectrum is occluded by bright lines, so the
supposed S/N loss is moot.

One common technique to reduce these sky 
residuals in otherwise conventional longslit observ- 
ing is to use a 'slow' beam-switching technique to 
improve the systematic residuals when observing 
ultra-faint targets by moving the object along the 
slit in consecutive observations. This is analogous 
to nod-shuffle except the CCD is read out between 
the two positions. The individual exposures must 
be at least 5-10 minutes (on a 4m telescope) to ob- 
tain enough sky signal to be background limited 
and consequently when the images are subtracted 
there is a residual due to temporal sky changes. 
This residual is removed again by fitting along the 
slit, but the systematics are reduced because of 
the lower overall level. Like nod-shuffle this will 
always introduce more subtraction noise. The 
minimum NSA versus this case is now 5.9 (under- 
filled) and 2.9 (overfilled). 

So far we have made the assumption that an 
independent linear fit must be done for each wave- 
length. However if the sky background has no 
structure, i.e. is observed in a wavelength region of 
featureless continuum, then we would expect the 
slope across the slit to vary only slowly with wave- 
length and the fitting can in principle be highly 
constrained. The underfilled NSA reaches 0.5 in 
this limit. However even in the blue part of the 
optical spectrum (350-500 nm) there is still considerable
structure in the night sky spectrum due
to scattered solar absorption lines.

Finally the NSA is maximised at very high tar- 
get densities. The required density is approxi- 
mately: 

$$\rho = \frac{3600\,\beta}{W\,\alpha\,x}\ \ \mathrm{objects\ arcmin^{-2}}$$

where β is the dispersion in Å/pixel, α is the spatial
scale in arcsec/pixel, x is the microslit size
in arcsec and W (in Å) is either the wavelength
range on the detector (when the spectra are short
compared to the detector size) or the minimum
wavelength overlap required for all objects by the
mask design (when the spectra are comparable
to or longer than the detector). For LDSS++,
α = 0.39 arcsec pix^-1 and β = 2.6 Å pix^-1; for the
HDF-S project we used W = 3000 Å and 1.0 arcsec
apertures. This gives a sky density requirement of
~8 objects arcmin^-2. For field galaxies this density
is achieved at R ≈ 23 (Hogg et al. 1997; Small
et al. 1995). It is also very suitable for observing
stellar and galaxy clusters. It is a much higher
density than can be achieved by conventional multislits
(~5-10×) and by fiber spectrographs; for
example the highly multiplexed 2dF spectrograph
can only reach 0.05-0.1 objects arcmin^-2 (Lewis
et al. 2000).
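
The density requirement is easy to evaluate for other instruments. The Python sketch below simply codes the expression ρ = 3600β/(Wαx) with the LDSS++ values quoted above; the function and argument names are illustrative.

```python
def required_density(beta, wavelength_range, alpha, microslit):
    """Target density (objects per square arcmin) needed to fill the detector.

    beta             -- dispersion in Angstrom per pixel
    wavelength_range -- W in Angstrom
    alpha            -- spatial scale in arcsec per pixel
    microslit        -- microslit size x in arcsec
    """
    return 3600.0 * beta / (wavelength_range * alpha * microslit)


if __name__ == "__main__":
    rho = required_density(beta=2.6, wavelength_range=3000.0, alpha=0.39, microslit=1.0)
    print(f"required density: {rho:.1f} objects per square arcmin")
```

Each spectrum occupies W/β pixels along the dispersion and x/α pixels across it, so the expression is the inverse of the detector footprint of one spectrum, converted to square arcminutes on the sky.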

5. Sky subtraction accuracy 

5.1. Achievable accuracy with conventional multi-slits

In order for the figures for nod-shuffle accuracy 
to be meaningful, it is useful to consider how well 
sky can be subtracted using a longslit. This is lim- 
ited by instrumental imperfections such as variable 
PSF, slit and CCD irregularities, slit tilt and pixel 
sampling effects, image distortion, fringing, flexure
etc. The effect of slit tilt, with respect to the
CCD columns, is particularly interesting as it is this
which causes linear sky variations across the slit.
If we consider the tilt as an angle θ then we expect
fractional sky variations along the slit:

$$\frac{\Delta S}{S} = L\,\theta\,\frac{1}{S}\frac{dS}{dx}$$

where L is the distance along the slit in pixels and
dS/dx is the rate of change of the sky count S
with pixel x in the spectrum. The instrument is
usually critically sampled so the PSF is 2-3 pixels.
This means we expect fractional sky fluctuations
of order unity between spectrally adjacent pixels
in regions near bright sky lines. This gives:

$$\frac{\Delta S}{S} \simeq L\,\theta$$

In the LDSS case the achievable rectilinear
alignment is 1 pixel in 1000 giving θ = 0.05°. In
our experience this is typical of modern spectrographs
as mechanical tolerances are usually designed
so that alignment is possible to ~ a CCD
pixel. Image distortion in the optics also turns out
to be a big effect. LDSS is a typical fast f/2 camera.
The change in radial distortion across a slit
length will introduce an apparent rotation (θ') if
the slit is off the cardinal axes. A useful formula
for this is:

$$\theta' \simeq \frac{dD}{dr}\,\frac{xy}{r^2}$$

for a slit at (x, y) with respect to the optical axis (radius
$r = \sqrt{x^2 + y^2}$) where the radial distortion
$D = r_2 - r_1$.

In the LDSS optics the typical distortion
dD/dr ~ 0.02, thus we can estimate typical apparent
rotations (using x ~ y ~ r) of θ' ≈ 2°.
Many similar systems have fast cameras (e.g.
the LRIS Keck multislit spectrograph camera is
f/1.56 and using the LRIS astrometry software
we find distortions of ~10 pixels over 400 pixels,
so dD/dr ~ 0.02), so we expect this order
of radial distortion to be typical of modern fast
spectrographs.

Putting these formulas together this rotation 
would cause a linear sky gradient of order 30% 
across a 10 arcsec slit. If the data could be resam- 
pled to sub-pixel accuracy to correct for tilts, we 
could expect to achieve 0.1 pixel accuracy which 
would still leave 10% variations. 
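
The tilt scaling can be illustrated with a few lines of Python. This is a sketch under the approximation used in the text, namely that the relative sky gradient near bright lines is of order unity per pixel; the helper names are hypothetical, and only the mechanical alignment case (1 pixel in 1000) and the 0.1 pixel resampling limit are evaluated.

```python
import math

PIXEL_SCALE = 0.39  # arcsec per pixel, the LDSS value quoted in Section 3


def tilt_angle_rad(offset_pix, over_pix):
    """Tilt angle for a misalignment of offset_pix over a length of over_pix."""
    return math.atan2(offset_pix, over_pix)


def shift_along_slit(slit_length_arcsec, theta_rad, pixel_scale=PIXEL_SCALE):
    """Spectral shift in pixels between the two ends of a tilted slit (L * theta)."""
    return slit_length_arcsec / pixel_scale * theta_rad


def fractional_sky_change(shift_pix, rel_gradient_per_pix=1.0):
    """Delta S / S for a spectral shift, assuming (1/S) dS/dx of order unity per pixel."""
    return shift_pix * rel_gradient_per_pix


if __name__ == "__main__":
    theta = tilt_angle_rad(1, 1000)
    print(f"alignment tilt: {math.degrees(theta):.3f} deg")
    shift = shift_along_slit(10.0, theta)
    print(f"10 arcsec slit: {shift:.3f} pix shift, {fractional_sky_change(shift):.1%} gradient")
    print(f"0.1 pix residual: {fractional_sky_change(0.1):.0%} variation")
```

With this approximation the fractional sky variation is numerically equal to the spectral shift in pixels, so mechanical alignment alone contributes a gradient of a few percent across a 10 arcsec slit, and a 0.1 pixel resampling error corresponds to the 10% variations quoted above.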

In principle, though, smooth variations can be removed. However,
another effect is slit irregularities. The milled metal slit masks
used in LDSS have 10-20 μm irregularities (1 arcsec = 150 μm at the
AAT's f/8). This is typical of machine-cut masks (Szeto et al., 1996).
Thus we also expect ~10% semi-random variations along the slit due to
this effect. This can be flatfielded out by dividing by a dispersed
white-light exposure, but the correction will be limited by flexure
between the white-light and the data exposures. LDSS flexes at about
0.5 pixels/hour, thus we can expect a misalignment of order 0.1-0.2
pixels, giving residuals of order 1%.
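As a rough check of the numbers above, the focal-plane scale follows from the telescope aperture and focal ratio. The sketch below assumes the 3.9 m aperture of the AAT and a 1 arcsec wide slit (neither is stated in this paragraph), and the function names are illustrative only.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi  # about 206265

def plate_scale_um_per_arcsec(aperture_m, focal_ratio):
    """Focal-plane scale in microns per arcsec for a given aperture and f-ratio."""
    focal_length_um = aperture_m * focal_ratio * 1.0e6
    return focal_length_um / ARCSEC_PER_RADIAN

def slit_width_variation(roughness_um, slit_arcsec, scale_um_per_arcsec):
    """Fractional change in slit width (and so in sky flux) from edge roughness."""
    return roughness_um / (slit_arcsec * scale_um_per_arcsec)

# Assumed: 3.9 m aperture at f/8 and a 1 arcsec slit.
aat_scale = plate_scale_um_per_arcsec(3.9, 8.0)
print(f"AAT f/8 scale: {aat_scale:.0f} um per arcsec")
for rough_um in (10.0, 20.0):
    frac = slit_width_variation(rough_um, 1.0, aat_scale)
    print(f"{rough_um:.0f} um roughness: {100.0 * frac:.0f}% of the slit width")
```

With these assumptions the f/8 scale comes out close to the quoted 150 μm per arcsec, and 10-20 μm of edge roughness corresponds to a width change of order 10%.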

So we are in a situation in LDSS where we are fitting slopes of order
10-30% with a slit length of 10-20 pixels and with systematic
variations of ±1%. The sky lines in LDSS at low resolution have peak
counts of ~2000 electrons in a half-hour exposure, so the random noise
will be about 2%. Fitting along the slit would reduce this to < 1%, at
which point it is comparable to the systematic slit irregularities.
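The noise figures quoted above can be checked with simple Poisson statistics. This is a minimal sketch that assumes pure shot noise and a plain average over N slit pixels; a slope fit would do slightly worse than the 1/√N scaling used here.

```python
import math

def fractional_shot_noise(counts):
    """Fractional Poisson noise on a pixel holding `counts` electrons."""
    return 1.0 / math.sqrt(counts)

def averaged_noise(counts, n_pix):
    """Fractional noise after averaging n_pix independent pixels along the slit."""
    return fractional_shot_noise(counts) / math.sqrt(n_pix)

peak_counts = 2000.0                      # peak sky-line counts quoted in the text
print(f"per pixel: {100.0 * fractional_shot_noise(peak_counts):.1f}%")
for n_pix in (10, 20):                    # slit lengths quoted in the text
    print(f"{n_pix} pixels: {100.0 * averaged_noise(peak_counts, n_pix):.2f}%")
```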

How faint can we go with 1% sky-subtraction accuracy? In the I-band
the sky background is dominated by the lines; if we demand that an
object has S/N ~ 3 then the faintest that can be reliably reached, in
any exposure time, is I_AB = 23.6 per arcsec². Fainter than that the
fluctuations in the spectrum will be dominated by sky residuals at the
lines, and for low-resolution I-band spectroscopy the lines occlude
most of the spectrum.

How could this be improved? One crucial area 
with scope for improvement is the microroughness 
of the slit edges. 

5.2. Improving multi-slit accuracy 

Conventional laser cutting (melting and vaporization) of metal (e.g.,
Al) masks produces 10-20 μm roughness. During manufacture, most metals
undergo warping during cutting which defocusses the laser. This is one
of the major sources of error in slit manufacture, which in turn
contributes to poor sky subtraction.

Recently, new slit masks made with laser-cut carbon fiber have already
achieved an order of magnitude improvement in edge roughness (Szeto et
al. 1996). An important step by the Gemini/GMOS team (Stilburn,
private communication) was to use epoxy-bonded sheets made of 3-ply
unidirectional carbon fiber with a total thickness of only 200 μm. The
center ply is orthogonal to the outer plies, and the slits are cut at
45° to the fiber direction. The low-power Nd:YAG laser cuts slits at
10 mm s⁻¹ and, remarkably, achieves a 1-2 μm edge roughness.

Let us assume an 8m size telescope with a larger image scale. At f/16
a 1 arcsec slit would be 600 μm, so the irregularities would be
0.1-0.2%. The larger mirror will accumulate more light, so we would
reach this limit in a 3 hour exposure, faster if our spectrograph was
more efficient. At 0.1% of sky we are now observing at a surface
brightness limit of I_AB = 26.1 per arcsec² with foreseeable
multi-slit technology. Improving the instrumental resolution will
reduce the amount of spectrum occluded by sky lines, though the peak
counts in the lines will stay approximately the same as they will stay
unresolved. There will be a danger of running into detector dark and
readout noise limits.

5.3. Nod-shuffle sky-subtraction accuracy 

It is clear that the achievable accuracy of sky-subtraction with the
nod-shuffle technique depends on how rapidly the nod-shuffling is
done. If this is done at a fast rate, changes in the night-sky
background are sampled more accurately, as well as changes in the
instrument such as flexure. However, characteristic timescales for the
latter are of the order of hours, so sky temporal variations will be
the limiting factor on the residuals.

In order to empirically measure the accuracy of sky-subtraction we
used a sequence of 8 longslit spectra, collected on 2-3 April 2000 at
the AAT in longslit mode. The targets were faint QSOs (I < 22) in a
scheduled AAT science project; by arrangement with the observers the
observations were done so as to allow us to try out different
nod-shuffle times. The slit was 4 arcmin long and the longslit data
were collected in nod-shuffle mode with the targets nodded 5-10
arcseconds along the slit. The log of the observations is given in
Table 1. A sample raw data frame is shown in Figure 4.

All the frames had the same total exposure time 
of 1800s, the only change was the rate of nod- 
shuffling which we varied from as fast as 15s to as 
slow as 450s. Once the QSOs are masked out the 
sky region of the 2D images can be used to quan- 
tify the effect of the nod-shuffle time on sky resid- 
uals. The data processing sequence is extremely 
simple: 

1. Frames are bias-level subtracted.

2. A median-filter smoothed version of each frame is made. The
   smoothing is entirely along the spatial (Y) axis with a smoothing
   kernel of 21 pixels (8.2 arcsecs). Because the slit is very closely
   aligned with the CCD columns (< 1 pixel) and the CCD has good
   flat-field characteristics, this essentially replaces each pixel
   with a smoothed estimate robust against cosmic rays.

3. The smoothed frame is used to calculate the variance map of the raw
   frame assuming shot noise from the sky and the known readout noise
   of the detector.

4. Cosmic rays are identified as > 10σ peaks in the RAW−SMOOTHED map
   and used to calculate an exclusion mask. Any pixel within 5 pixels
   of a cosmic ray peak is masked. Cosmic ray identifications are
   checked visually. This mask excludes about 1% of all pixels on each
   frame.

5. The cosmic ray mask is ORed with another mask which excludes several
   bad columns and the centre rows where the QSO spectra lie.

6. A sky spectrum is formed for each frame by averaging unmasked pixels
   along the slit. A variance spectrum is also calculated.

7. A residual sky spectrum is formed for each frame by repeating step 6
   for the residual A−B frame.
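Steps 2 to 4 can be sketched for a single CCD column as follows. This is an illustrative pure-Python version, not the reduction code used for the analysis; the 3 electron readout noise, the synthetic data and all function names are assumptions made for the example.

```python
import statistics

def median_smooth(column, kernel=21):
    """Running median along the spatial axis (edges use a truncated window)."""
    half = kernel // 2
    return [statistics.median(column[max(0, i - half): i + half + 1])
            for i in range(len(column))]

def cosmic_ray_mask(column, read_noise=3.0, nsigma=10.0, grow=5, kernel=21):
    """Return a list of booleans, True where a pixel should be excluded."""
    smooth = median_smooth(column, kernel)
    # Variance map: sky shot noise (counts in electrons) plus readout noise.
    variance = [max(s, 0.0) + read_noise ** 2 for s in smooth]
    peaks = [i for i, (raw, s, v) in enumerate(zip(column, smooth, variance))
             if raw - s > nsigma * v ** 0.5]
    mask = [False] * len(column)
    for p in peaks:
        # Exclude every pixel within `grow` pixels of a cosmic-ray peak.
        for j in range(max(0, p - grow), min(len(column), p + grow + 1)):
            mask[j] = True
    return mask

def masked_mean(column, mask):
    """Average of the unmasked pixels (step 6 for one wavelength bin)."""
    good = [v for v, m in zip(column, mask) if not m]
    return sum(good) / len(good)

column = [100.0] * 60        # flat synthetic sky, in electrons
column[30] = 5000.0          # one synthetic cosmic-ray hit
mask = cosmic_ray_mask(column)
print(sum(mask), masked_mean(column, mask))
```

A real frame would apply the same operations to every column and OR the result with the bad-column and QSO masks of step 5.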

To calculate the fractional sky-residual Δsky/sky we can integrate the
residual and sky spectra in wavelength and divide. Absolute flux
calibration is not necessary. We chose two wavelength regions: the
first region encompasses the two main OH regions in the I-band
(7200-8880 Å) and the second region encompasses the 5577 Å [O I] line
(60 Å width bandpass). We choose to fit and remove the continuum level
from the spectrum before doing the summation. This is because there is
not enough unilluminated space on the detector to allow accurate
determination and extrapolation of the level of scattered light. In
any case the integrated sky brightness is dominated by the lines, not
the continuum, and it is the temporal variation of the line flux we
are primarily concerned with. Since our sky spectrum is also
integrated along 4 arcmins of slit we can go very deep in measuring
systematic residuals.
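The fractional residual can be illustrated with a toy calculation. The sketch below takes the median of each band as a crude stand-in for the fitted continuum level, which is an assumption of this example and only reasonable when lines fill fewer than half of the pixels.

```python
import statistics

def line_flux(spectrum):
    """Sum over the band after removing a constant continuum (here the median)."""
    continuum = statistics.median(spectrum)
    return sum(value - continuum for value in spectrum)

def fractional_sky_residual(sky, residual):
    """Integrated residual line flux divided by integrated sky line flux."""
    return line_flux(residual) / line_flux(sky)

# Synthetic band: two sky lines on a flat continuum, and an A-B residual
# that is one part in a thousand of each line.
sky = [10.0, 10.0, 510.0, 10.0, 1010.0, 10.0, 10.0]
residual = [0.0, 0.0, 0.5, 0.0, 1.0, 0.0, 0.0]
print(fractional_sky_residual(sky, residual))
```

Because only a ratio of fluxes is formed, the units of the input spectra cancel, which is why no absolute flux calibration is needed.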

Our results are shown in several figures. Firstly, Figure 5 shows raw
and residual spectra for our two regions for different nod-shuffle
times. Figure 6 shows Δsky/sky plotted against nod-shuffle time for
the two regions. There is a clear trend
of systematics consistent with scatter around zero
at the level of ±10⁻³ for small nod-shuffle times (<100s); the level
of the scatter is about 3σ. For large nod-shuffle times (>100s) there
are gross systematic residuals at the ±10⁻² level.

One limitation of our particular nod-shuffle technique is that we
observe an asymmetric sequence:

ABAB...ABAB

If there were a systematic change in sky brightness during the course
of the observations we would expect to see a residual because the
average B frame is slightly later in time than the average A frame.
A systematic decrease in OH emission during the

Fig. 4. Example of the longslit data which goes into our sky-residual
analysis. (a) Raw, shuffled image (except for cosmic rays being
patched out). (b) A−B subtracted image showing the 2D residuals.
Residuals are integrated along the slit and across a wavelength range
as described in the text.

[Fig. 5 panels: 02apr0010, NS time = 7.5 secs; 03apr0012, NS time = 30
secs; 05apr0007, NS time = 300 secs. Horizontal axis: Wavelength / Å.]

Fig. 5. Sky residuals in a 1800s exposure as a function of nod-shuffle
time. The upper line in each panel shows the raw sky/10; the lower
points (with error bars) show the residual after nod-shuffle
subtraction. A clear point-to-point systematic is seen in the 300s
nod-shuffle exposure, while the others are consistent with zero.
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Table 1: Log of observations for the nod-shuffle sky residual analysis, all 1800s total integration time 



AAT RUN     NS-Time/secs   UT start   Remarks

02APR0001   30             09 46 54   Some cloud (5/8ths)
02APR0008   15             13 06 03
02APR0010   7.5            14 00 32   Clear
02APR0012   30             15 18 48
03APR0004   60             14 17 06   Clear, bad seeing (5-10")
03APR0005   60             14 50 51
03APR0007   30             15 40 35   "  "
03APR0008   30             16 15 11
05APR0007   300            14 29 03   Clear, v. bright O2 emission (8645 Å)
06APR0004   150            09 25 45   Clear, seeing 2-2.5"
06APR0005   450            09 58 54





course of the night is often observed (Leinert et al. 1998). This
effect is normally explained as the result of energy stored during the
day in the respective atmospheric layers (Kondratyev 1969). We see
evidence for exactly this effect, with the correct sign, in our data
(Figure 7). An additional source of long-term variation is the effect
of changing airmass during an extended observing sequence on a single
source (Bland-Hawthorn et al. 1998). In principle it is
straightforward to reduce these effects by improving the nod-shuffle
method with a symmetric mode, i.e.:

$$\frac{A}{2}\,B\,A\,\ldots\,A\,B\,\frac{A}{2}$$

Then the A − B subtraction would cancel out any linear trend. However,
we have yet to try this in our AAT implementation.
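The cancellation of a linear trend by the symmetric sequence can be verified numerically. The sketch below assumes a sky brightness that rises linearly at 1% per 1000 s (an arbitrary illustrative value) and compares the two orderings for a 30 s nod time and 1800 s of total exposure.

```python
def integrated_sky(t0, t1, slope):
    """Integral of S(t) = 1 + slope * t between t0 and t1."""
    return (t1 - t0) + 0.5 * slope * (t1 ** 2 - t0 ** 2)

def fractional_residual(sequence, slope):
    """(A - B) / B for a list of (label, duration) steps, label 'A' or 'B'."""
    t = 0.0
    total = {"A": 0.0, "B": 0.0}
    for label, duration in sequence:
        total[label] += integrated_sky(t, t + duration, slope)
        t += duration
    return (total["A"] - total["B"]) / total["B"]

nod, cycles, slope = 30.0, 30, 1.0e-5     # assumed drift: 1% per 1000 s

# ABAB...AB as currently observed.
asymmetric = [("A", nod), ("B", nod)] * cycles
# Half-length A frames at both ends, full frames in between.
symmetric = ([("A", nod / 2)] + [("B", nod), ("A", nod)] * (cycles - 1)
             + [("B", nod), ("A", nod / 2)])

print(f"asymmetric: {fractional_residual(asymmetric, slope):.2e}")
print(f"symmetric:  {fractional_residual(symmetric, slope):.2e}")
```

For the asymmetric sequence the residual is close to minus the drift rate times the nod time, so it shrinks in proportion to the nod time; for the symmetric sequence it vanishes to rounding error.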

The effect of drift should also cancel to some extent for long
all-night nod-shuffle exposures which bracket local midnight. It would
be desirable to take much longer integrations with a fast nod-shuffle
rate to explore the limits of this technique. While we do not have
such data, what we can do is stack all our data where the nod-shuffle
time is <100s. This gives us a 5.5 hour very deep exposure, albeit
with a variable nod-shuffle time. The residual point from the 5.5 hour
stack is (4.0 ± 1.2) × 10⁻⁴, a 3σ detection. It is important to
realise that this is an impressively small residual, corresponding to
an I_AB = 28.3 mag arcsec⁻² source. This level of accuracy is a factor
of 10-20× better than is typically achieved with slits (see Section
5.1).
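The surface-brightness limits quoted in this section are mutually consistent if a fraction f of the sky is converted to magnitudes as m = m_sky − 2.5 log10(f). In the sketch below the I-band sky brightness of 19.8 mag arcsec⁻² is an assumption, back-solved from the quoted limits rather than taken from the text.

```python
import math

def surface_brightness(sky_mag, fraction):
    """Surface brightness (mag per arcsec^2) of a signal that is `fraction` of the sky."""
    return sky_mag - 2.5 * math.log10(fraction)

SKY_I_AB = 19.8   # assumed value, inferred from the quoted limits

limit_1pc = surface_brightness(SKY_I_AB, 3 * 0.01)     # S/N ~ 3 at 1% accuracy
limit_01pc = surface_brightness(SKY_I_AB, 3 * 0.001)   # S/N ~ 3 at 0.1% accuracy
stack = surface_brightness(SKY_I_AB, 4.0e-4)           # the stacked residual itself
print(f"{limit_1pc:.1f} {limit_01pc:.1f} {stack:.1f}")
```

Each factor of ten in sky-subtraction accuracy is worth 2.5 magnitudes in limiting surface brightness, which is the step between the 1% and 0.1% limits.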



We also emphasize that this is a lower limit to what could be achieved
with faster nod-shuffle times. One could nod-shuffle faster (e.g. 10s)
for a whole very long exposure. Also one should implement the
symmetric mode to cancel long-term sky-brightness drifts. Finally, for
the ultimate sky-subtraction limits one could combine nod-shuffle with
slits to allow for 2D interpolation and removal of any local residuals
after nod-shuffle subtraction.
Accuracies of 10~^ or better should be achievable. 



[Fig. 6 plot labels: I OH band; 5577 Å line. Horizontal axis:
Nod-shuffle time / seconds.]

Fig. 6. Sky residuals vs nod-shuffle time for the chosen OH band and
the 5577 Å line. The rectangles indicate the range of the ±1σ errors
vertically and are filled for OH, open for 5577 Å. Small artificial
temporal offsets are used at 30s and 60s to show the multiple points
with clarity.

5.4. Comparison of residuals to theoretical predictions

We have shown that the nod-shuffle residuals 
appear to be characteristically smaller for nod 
steps below 100 sec compared to longer sample ex- 
posures. We now examine this with a simulation 
of the nod-shuffle technique using a model which 
attempts to describe the time-variable behaviour 
of OH emission. 

Suitable observations for deriving the temporal 
power spectrum of OH are hard to come by. Line 
strength variations on timescales of 5-10 mins are 
given by Bland-Hawthorn et al. (1998) for optical 
lines and Ramsay et al. (1992) for near-infrared 
lines. The latter reference shows the OH behavior 
to be approximately sinusoidal on timescales of 
an hour with a peak-to-peak amplitude of about 
10%. On longer timescales, the OH variation is 
more erratic. 

Our model for atmospheric variability uses a finite set of sinusoidal
modes with periods 16, 23, 26, 29, 38, 51 and 101 mins. The amplitudes
of the variations are inversely related to the period such that the
16 min mode dominates, in rough accordance with the wave-like
structures observed by Ramsay et al. (1992). The peak-to-peak
amplitude is 15% of the mean line strength. For each mode, there is a
5% dispersion in the period and amplitude, each with random phases.
Our predicted behavior is in good agreement with the above references.

However, high cadence observations show clear evidence for stochastic
behavior on shorter observational timescales. Here, we found data from
the 2MASS Wide-field Airglow Experiment¹ to be the most useful (Adams
& Skrutskie 1997). The H band observations have an order of magnitude
finer sampling than in Ramsay et al. (1992). We simulate this by
including a component of 1/f noise within our model (cf. Barnes &
Allan 1966). To generate the 1/f component we use gaussian white noise
scaled to 5% (1σ) of the mean line strength (see Adams & Skrutskie
1997, Fig. 2) convolved with Green's impulse function
I(t − t₀) = c (t − t₀)^(−1/2) for t > t₀ and I(t − t₀) = 0 for
t < t₀. For convenience, we set c = 1 and sample the time axis in
units of seconds. An example time series is shown in Figure 8.
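A time series of this kind can be generated with a few lines of code. The following is a sketch of one possible implementation, not the simulation used for the figures: the amplitude normalisation, the one-sample offset that avoids the singularity of the kernel at zero lag, the truncation of the kernel at 600 s and the rescaling of the 1/f term to a 5% rms are all assumptions.

```python
import math
import random

PERIODS_MIN = [16, 23, 26, 29, 38, 51, 101]   # mode periods from the text

def oh_time_series(n_sec=1800, seed=1, flicker_rms=0.05, kernel_len=600):
    """Relative OH line strength sampled once per second (mean close to 1)."""
    rng = random.Random(seed)
    weights = [1.0 / p for p in PERIODS_MIN]          # amplitude ~ 1/period
    scale = 0.075 / sum(weights)                      # at most ~15% peak-to-peak
    modes = [(scale * w * rng.gauss(1.0, 0.05),       # amplitude, 5% dispersion
              60.0 * p * rng.gauss(1.0, 0.05),        # period in s, 5% dispersion
              rng.uniform(0.0, 2.0 * math.pi))        # random phase
             for p, w in zip(PERIODS_MIN, weights)]
    # 1/f term: white noise convolved with the causal kernel (t - t0)^(-1/2).
    white = [rng.gauss(0.0, 1.0) for _ in range(n_sec + kernel_len)]
    kernel = [(lag + 1) ** -0.5 for lag in range(kernel_len)]
    raw = [sum(kernel[lag] * white[i + kernel_len - 1 - lag]
               for lag in range(kernel_len))
           for i in range(n_sec)]
    mean = sum(raw) / n_sec
    rms = math.sqrt(sum((r - mean) ** 2 for r in raw) / n_sec)
    flicker = [flicker_rms * (r - mean) / rms for r in raw]
    return [1.0 + flicker[t]
            + sum(a * math.sin(2.0 * math.pi * t / p + ph) for a, p, ph in modes)
            for t in range(n_sec)]

series = oh_time_series()
print(len(series), min(series), max(series))
```

Fixing the seed makes a run repeatable; changing it gives an independent realisation of the phases and of the stochastic component.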

In Figure 9, we have attempted to simulate nod-shuffle sampling of our
model atmosphere. The total exposure time is 1800 secs and the time
series is sampled at all possible time steps (longer than

¹ See: http://pegasus.phast.umass.edu/2mass/teaminfo/airglow.html
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Fig. 7. — OH sky residuals vs UT for nod-shuffle 

times <100s. Local midnight (14h UT) and the end/start of twilight are
indicated by dotted lines. We expect to see a negative residual at the
start of the night and the residuals should have a positive slope with
time; we do in fact see this.

Fig. 8. Example OH airglow time series generated from our model. The
shaded bands indicate periods of 400 secs.
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or equal to 10 sec) that lead to an integer number 
of cycles. For each nod exposure, the simulation
was run 10 times. The mean residuals (and 1σ errors)
are shown as a function of the nod exposure.
There is evidence for a change in character on either
side of about 2 min time steps. The residuals
with 2 min samples or longer are 10⁻³ or larger;
the residuals from faster sampling are 1-4 × 10⁻⁴.

Repeated runs of our model atmosphere show 
that this changeover can be as short as 1 min. 
There are also times when short sample time steps 
lead to big residuals (e.g. 20 sec) and when long 
time steps lead to residuals smaller than 10⁻³.
These are times when the nod-shuffle sequence happens to fall in or
out of step with a beating atmosphere. Airglow is clearly a
complicated phenomenon: empirically it is clear that the nod-shuffle
time should be < 30 sec. The total number of shuffles should not
greatly exceed « 10^ per readout if one is to avoid significant
degradation from trapping sites within the silicon substrate
(Bland-Hawthorn & Barton 1995). Given the periodic nature of the
airglow oscillations it is possible that an optimal shuffle sequence
ought to have variable time sampling to avoid beating.
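The dependence of the residual on nod time can be illustrated with a much simpler sky than the full model: a single 16 min sinusoid with a 15% peak-to-peak amplitude, integrated exactly over alternating A and B intervals. The choice of one mode, the definition of a cycle as one A plus one B interval, and the function names are assumptions of this sketch.

```python
import math

def allowed_nod_times(total=1800, shortest=10):
    """Integer nod times (s) giving a whole number of A+B cycles in `total` s."""
    return [t for t in range(shortest, total // 2 + 1) if total % (2 * t) == 0]

def residual(nod, phase, total=1800.0, period=960.0, amp=0.075):
    """Fractional (A - B)/B residual for a sky 1 + amp*sin(2*pi*t/period + phase)."""
    w = 2.0 * math.pi / period

    def integral(t0, t1):
        # Exact integral of the model sky between t0 and t1.
        return (t1 - t0) - amp * (math.cos(w * t1 + phase)
                                  - math.cos(w * t0 + phase)) / w

    a = b = t = 0.0
    while t < total - 1e-9:
        a += integral(t, t + nod)
        b += integral(t + nod, t + 2 * nod)
        t += 2 * nod
    return (a - b) / b

def rms_residual(nod, n_phase=64):
    """RMS of the residual over evenly spaced starting phases."""
    values = [residual(nod, 2.0 * math.pi * i / n_phase) for i in range(n_phase)]
    return math.sqrt(sum(v * v for v in values) / n_phase)

print(allowed_nod_times())
for nod in (10, 30, 150, 450):
    print(nod, f"{rms_residual(nod):.1e}")
```

Even this single-mode sky shows the residual growing steeply with nod time; the beating effects described above only appear once several modes and the stochastic component are included.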

5.5. Object-sky balance 

The question arises: what is the optimum balance between OBJECT and
SKY time in a nod-shuffle sequence? This is especially important when
we are nodding out of a microslit and the SKY frame is not collecting
any object photons. Perhaps one should cut down on the relative
frequency of SKY frames? It turns out the optimum balance is in fact
50:50, i.e. symmetrical. Consider an exposure of total time T where a
fraction x is spent on OBJECT and (1 − x) on SKY. Let O and S be the
object and sky flux (photons/pixel/sec). We will neglect readout
noise, which is equivalent to assuming that T is long enough that both
O and S are large enough that their shot noise dominates over the
readout noise, which is optimum. We will also assume that the object
is much fainter than the sky, i.e. O ≪ S.

[Fig. 9 plot label: horizontal axis, Nod-shuffle time / seconds.]

Fig. 9. Simulation of the expected residuals in a 1800s exposure for
different nod-shuffle times. This should be compared with Figure 6.

We form the residual sky-subtracted image as:

$$\mathrm{OBJECT} - \frac{x}{1-x}\,\mathrm{SKY}$$

Then the signal to noise in the residual image is:

$$S/N = \frac{xOT}{\sqrt{xST/(1-x)}} = O\sqrt{\frac{T}{S}}\,\sqrt{x(1-x)}$$

x(1 − x) has a maximum when x = 0.5, i.e. equal times on OBJECT and
SKY. x could be reduced in a scheme where the SKY frames were averaged
over multiple observations or multiple slits before subtracting;
however, one then loses the crucial ability of the simple nod-shuffle
scheme to follow precisely short-term and long-term temporal
variations in the sky and eliminate local effects such as
flat-fielding, fringing, flexure, slit roughness, etc., from the sky
subtraction.
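The optimum can be confirmed numerically without the faint-object approximation. The sketch below keeps the object shot noise in the OBJECT frame and searches a grid of x values; the count rates are arbitrary illustrative numbers and the function names are not from the text.

```python
import math

def signal_to_noise(x, obj, sky, total_time):
    """S/N of OBJECT - [x/(1-x)] SKY with shot noise only."""
    signal = x * obj * total_time
    var_object = x * (obj + sky) * total_time
    # The SKY frame is scaled by x/(1-x) before subtraction.
    var_sky = (x / (1.0 - x)) ** 2 * (1.0 - x) * sky * total_time
    return signal / math.sqrt(var_object + var_sky)

def best_fraction(obj, sky, total_time=1800.0, steps=999):
    """Grid search for the OBJECT time fraction x that maximises S/N."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda x: signal_to_noise(x, obj, sky, total_time))

print(best_fraction(obj=0.01, sky=10.0))     # object much fainter than sky
print(best_fraction(obj=1000.0, sky=1.0))    # object much brighter than sky
```

For a faint object the search returns a fraction very close to one half, while for an object much brighter than the sky the best fraction moves towards unity.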

Finally we note that, not surprisingly, in the case O ≫ S, i.e. the
object is much brighter than the sky, the maximum S/N is obtained when
as much time as possible is spent on the OBJECT. However, in this
regime the sky contribution to the statistical noise is negligible so
nod-shuffle is not very useful, except possibly in an observation
where systematic effects were an important concern (for example
velocity dispersion measurements of bright galaxies as discussed in
Sembach & Tonry, 1996).

5.6. Effect of random objects on sky-subtraction

We conclude our section on sky-subtraction by 
considering the effect of random interloping ob- 
jects on the accuracy. In our simple AAT imple- 
mentation we nod between two positions, so there 
is some chance there will be an interloper in the 
sky position. 

We can estimate this effect using deep galaxy number-magnitude counts
(Hogg et al. 1997). At our HDF-S limit of R = 24 there are ~60000
galaxies deg⁻², which equates to a 1 in 200 chance of a 1 arcsec²
aperture encountering one. This is consistent with our HDF-S
observations where two negative spectra were observed.

This can be alleviated by dithering the sky position. This can be done
in two ways. Firstly, separate nod-shuffle exposures can have
different sky positions. Then the frames can be combined with outlier
clipping after pair-subtraction to effectively remove the interloping
spectrum with negligible effect on signal/noise (as only a tiny
fraction of pixels are rejected).

Secondly, a more technically sophisticated approach would be to drive
to a different sky position on each shuffle. This would be
advantageous for short shuffle runs where there are not many
individual exposures. A disadvantage is that the effective average sky
is not outlier clipped; however, the flux of interlopers is still
greatly reduced. We note this mode is not possible with our AAT
system, but is in principle straightforward to implement.

In view of the remarks in Section 7.3 about 30m telescopes it is
useful to consider the ultimate achievable limits. For very faint
galaxies it would be sensible to use smaller slits, because the
faintest observed objects in the Hubble Deep Field typically have
half-light radii of only 0.1-0.2 arcsecs (Gardner and Satyapal, 2000).
At this limit (I_AB ≈ 30) there are of order ~10⁶ galaxies deg⁻², so
the covering factor at 3 half-light radii is still only 10%. Thus the
sky-subtraction problem is still tractable with dithering.
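Both estimates are simple surface-density arithmetic. In the sketch below the 0.2 arcsec half-light radius is the upper end of the quoted range, and the density of 10^6 galaxies per square degree is taken as an order-of-magnitude assumption.

```python
import math

ARCSEC2_PER_DEG2 = 3600.0 ** 2

def galaxies_per_aperture(density_per_deg2, aperture_arcsec2):
    """Mean number of galaxies falling in an aperture of the given area."""
    return density_per_deg2 * aperture_arcsec2 / ARCSEC2_PER_DEG2

# R = 24 counts and a 1 arcsec^2 aperture, as quoted in the text.
p_interloper = galaxies_per_aperture(60000.0, 1.0)
print(f"1 in {1.0 / p_interloper:.0f}")

# Faint limit: circular footprint of 3 half-light radii (0.2 arcsec assumed)
# and an assumed density of order 1e6 galaxies per square degree.
radius = 3 * 0.2
covering = galaxies_per_aperture(1.0e6, math.pi * radius ** 2)
print(f"covering factor: {100.0 * covering:.0f}%")
```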

Finally we note that even with an interloper the sky-subtraction
itself is still accurate. This contrasts with the longslit case where
the interloper can disturb the interpolation. The result is the sum of
the positive and negative spectrum; if the relative brightnesses are
similar and the signal/noise is sufficient, in principle redshifts can
be derived for both objects.

6. Sample observing modes 

A discussion of the different modes of observing which have been tried
with LDSS++ is useful to show the potential new capabilities.

The most conventional mode is multi-object 
spectroscopy with wide wavelength range. Sam- 
ple raw data was shown in Section 3. 

We would like to illustrate briefly two other 
modes which have been used recently to achieve 
very high multiplex levels of 1000-2000 objects per 
LDSS mode. 

It is well known that use of a blocking filter to limit the wavelength
range of a spectrum allows many more slits to be used on a mask
without spectral overlap. When this technique is combined with the use
of microslits an extremely large multiplex results, which allows
high-density mapping of fields in chosen spectral lines. For example,
in the last year LDSS++ has been used to map Hα emission in the core
and outskirts of the z = 0.32 galaxy cluster AC114 (Couch et al.
2000). The TAURUS blocking filter R6 was used, which gives a bandpass
of 400 Å for Hα (and [N II]) at the cluster redshift. Using this
technique 828 slits were placed on galaxies in an 8 arcmin field
around the cluster. Figure 10 shows a diagram of the spectral layout
on the detector; it can be seen that despite the large number of slits
and good 2D coverage of the cluster no overlap occurs. Also shown in a
zoom are actual sky-subtracted cluster spectra where the Hα lines can
be seen.

Another mode which has been developed for LDSS++ takes the multiplex
to an extreme limit by taking advantage of the superb sky-subtraction
without a slit. The key idea is to place microslit apertures on large
numbers of targets (up to several thousand) without regard to spectral
overlap, and possibly even without a blocking filter.

Of course the dispersed sky from such a config- 
uration will generate a very complex, overlapping 
pattern. However this can still be removed by the 
nod-shuffle technique, and the residual noise level 
can be easily calculated. Any features left can 
have a measurable significance assigned to them. 

Why would such observing be useful? Well, one example project is
illustrated in Figure 11. Here ~2000 slits were placed on galaxies
selected to R ~ 26 in a 7 arcmin field called the 'Herschel Deep
Field' (McCracken et al. 2000). The sky is removed by nod-shuffle and
a noise map is calculated. If a galaxy has strong emission lines then
they peak up above the noise map.

Fig. 10. Hα spectroscopy of the z = 0.32 cluster AC114. Top: layout of
the spectra in the 9 arcmin FOV. Bottom left: zoom showing
sky-subtracted, dispersed image with a couple of Hα lines visible.
Bottom right: two sample extracted spectra showing Hα and [N II].

Essentially we are searching virtually all galaxies in the field for
emission, so it is similar to a slitless grism survey. However, we
still have a mask in the beam so the level of the sky background is
enormously reduced (a factor of 50 in this case) with corresponding
increase in signal/noise. Because of the similarity we call this
method 'pseudo-slitless'. Another way of looking at this is that we
are using our prior knowledge of where galaxies are in the broad-band
image to exclude unwanted sky photons. The background is higher than
in conventional spectroscopy, but more objects are observed
simultaneously. In principle these effects cancel exactly: if there
are N times more microslits then the average background is N times
higher and the exposure has to be N times longer for the same
signal/noise. In practice there are gains in efficiency due to factors
such as overlap and clustering which complicate slit assignment in the
normal case. For the real example in Figure 11 the factor N ≈ 10.

How does this approach compare against, for example, narrow-band
imaging and scanning in wavelength? In the pseudo-slitless mode we are
pre-selecting from the broad-band so it is possible to miss
pure-emission line objects. If we ignore this difficult to quantify
handicap then there is a net gain. Let us assume the tunable-filter
instrument has the same absolute throughput as the spectrograph. The
pseudo-slitless approach gives a very large wavelength coverage, in
our example 5300 Å. At a resolution of 20 Å that would need 265
tunable-filter settings. In our example the pseudo-slitless approach
has 10 times higher background, so the gain is a factor of ~26, for
the objects searched.
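The bookkeeping behind this estimate is short enough to write out. The sketch assumes, as in the text, equal throughput for the two instruments and an exposure time that scales linearly with the background; the function name is illustrative.

```python
def pseudo_slitless_gain(coverage_angstrom, resolution_angstrom, background_factor):
    """Time gain over a tunable filter stepping through the same wavelength range."""
    settings = coverage_angstrom / resolution_angstrom   # filter steps needed
    return settings / background_factor                  # penalty for higher sky

settings = 5300.0 / 20.0
gain = pseudo_slitless_gain(5300.0, 20.0, 10.0)
print(settings, gain)
```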

Some data was collected in this mode in August 1999. The project is
attempting to quantify the space density of Hα, Hβ, [O II] and Lyα
line sources at z = 0.2, 0.6, 1.1 and 5.6 respectively (Glazebrook et
al. 2000b).

There is of course an inherent ambiguity: if an emission line is
detected how can we determine which microslit it came from? There will
be many candidates along its dispersion track. This is resolved in two
ways: firstly, a minimum separation is enforced between slits (e.g. a
few arcsec) to allow for errors in the traceback. Secondly, the
observations are made for different mask orientations on the sky. As
the grism is kept fixed we get a different set of tracks. For the
observations here positions of 0° and 180° were used: the emission
line is dispersed in opposite directions in each case and the correct
microslit lies halfway between them.
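The traceback for the 0° and 180° pair can be sketched as below. Positions are taken along the dispersion axis in a common mask frame; the function name, the tolerance and the example coordinates are hypothetical.

```python
def traceback(x_at_0, x_at_180, slit_positions, tolerance):
    """Return the slit nearest the midpoint of the two detections, or None."""
    midpoint = 0.5 * (x_at_0 + x_at_180)
    best = min(slit_positions, key=lambda s: abs(s - midpoint))
    return best if abs(best - midpoint) <= tolerance else None

slits = [120.0, 300.0, 455.0]            # hypothetical slit coordinates (pixels)
print(traceback(380.0, 222.0, slits, tolerance=5.0))   # line seen at both angles
print(traceback(380.0, 260.0, slits, tolerance=5.0))   # no slit near the midpoint
```

The minimum slit separation mentioned above plays the role of the tolerance here: it must exceed the expected traceback error so that at most one slit can match.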

Finally we note that it is possible to arbitrarily 
combine the approaches described here. For ex- 
ample in the pseudo-slitless mode blocking filters 
can also be used: this will limit the spectral cov- 
erage but also reduce the background. There is 
a choice as to whether to go for low or high mi- 
croslit densities — the latter will mean having to 
deal with confusion and a higher background. 

7. Future prospects 

7.1. Nodding with infrared arrays 

7.1.1. Prospects for mimicking shuffling directly 

Can the nod-shuffle concept be extended to include IR-sensitive
devices? We have been asked this question many times; since the OH
night sky lines account for 98% of the sky background in the J and H
bands this would give major gains. However, infrared arrays are
fundamentally different devices from CCDs. In conventional arrays, the
pixels are not charge-coupled so that charge cannot be shifted between
pixels (Rieke 1994, McLean 1997).

CCDs are monolayer devices where the charge is normally shifted row by
row into the read-out (shift) register. Pixels within the read-out
register are read out serially towards the output amplifier by means
of 2, 3 or 4-phase shift electrodes. In contrast, the Rockwell hybrid
arrays are 2-layer devices which use a thin HgCdTe film to collect the
light, which in turn is connected pixel-by-pixel via indium bump bonds
to a MOSFET multiplexor. Each pixel is addressed in an (x, y) fashion
through the use of a row and a column shift register at two edges of
the multiplexor. In the 'source-follower' multiplexor design, the bump
bond makes contact with a MOSFET. When IR photons hit the
light-sensitive layer, the electrons are transferred through the bond
to the capacitance-storing MOSFET gate. This gate is bordered by a
'source' (grounded) and 'drain'. This circuitry allows for a
'non-destructive read' (NDR) of the voltage across the gate. Another
FET is attached to the gate to allow every element of the array to be
'reset' in a single action.

Fig. 11. Illustration of a region of data in the 'pseudo-slitless'
mode. The full mask (about 7 arcmin) is shown at the top. Slits have
been placed on every object with R < 26 (except near bright foreground
stars). The slit density is about 50 arcmin⁻². The lower panels show a
region about 1.6 arcmin across zoomed in. Left: before sky-subtraction
showing the complex overlapping pattern. Right: after sky-subtraction
showing a noise pattern plus some bright emission lines from a
low-redshift galaxy. Continuum from some bright objects can also be
seen.

We have considered possible modifications to 
the IR array design which would allow for the 
equivalent of CCD-style charge shuffle operations, 
i.e. that contains two or more switchable pockets 
per pixel in which to store charge. Unlike Rockwell 
arrays, there exist multiplexors which use arrays 
of FETs as op-amps which simply transfer photo- 
generated charge to an integrating capacitor (e.g. 
Kozlowski 1996). One could conceive switching 
between a pair, or more, of integrating capacitors 
in which to build up charge sequentially over time. 

However, the more connections you attach to 

the detecting node, the more the capacitance goes 
up, and therefore the read noise. The array mul- 
tiplexor already has a higher circuit density com- 
pared to CCDs and this would increase it further. 
This would be a very difficult technology to de- 
velop. 

7.1.2. Can one use Non-Destructive Reads to fa- 
cilitate beamswitching? 

We have also considered the question of whether 
the non-destructive read mode with ramp sam- 
pling could be used to mimic shuffling, for exam- 
ple by switching between OBJECT and SKY while 
sampling up the slope and solving for OBJECT 
and SKY count rates simultaneously while still 
allowing readnoise reduction (the main point of 
ramp sampling). This is illustrated in Figure 12. 

We have solved analytically the case for double-slope least square
fitting. If n is the total number of reads with error σ we find for
large n that the error on the OBJECT slope σ_O is given by:

(Full derivation is available on request from the authors.)

where k is the number of OBJECT-SKY sub-intervals (e.g. k = 6 in
Figure 12). If we compare this with the classic single least square
formula (σ_O = √(12σ²/(n³Δt²))), we derive the ratio:

$$\frac{\sigma_O(\mathrm{double})}{\sigma_O(\mathrm{single})} \simeq 2k$$

The factor of 2 is the usual beamswitching factor encountered in
Section 4. We see the effect of beamswitching is to increase the noise
in proportion to the number of switches; this is because the switching
reduces the baseline for slope fitting. It turns out for reasonable
values of n and k this is not a useful technique. For example, suppose
the array can be read out every second during a 1800 sec exposure.
Single least-squares would give a noise reduction of ≈ 12×; if we then
beamswitch every 30s this becomes a noise increase of ≈ 4.9×.
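The single-fit numbers can be checked directly. The sketch below compares the large-n formula with the exact least-squares slope error and then forms the two factors quoted above; taking σ/(nΔt) as the reference noise and a double-to-single ratio of 60 (one unit per 30 s sub-interval) are assumptions that happen to reproduce the quoted ≈12× and ≈4.9×.

```python
import math

def slope_error_exact(n, dt, sigma):
    """Least-squares slope error for n equally spaced reads of noise sigma."""
    times = [i * dt for i in range(n)]
    mean_t = sum(times) / n
    return sigma / math.sqrt(sum((t - mean_t) ** 2 for t in times))

def slope_error_large_n(n, dt, sigma):
    """Large-n approximation: sqrt(12 sigma^2 / (n^3 dt^2))."""
    return math.sqrt(12.0 * sigma ** 2 / (n ** 3 * dt ** 2))

n, dt, sigma = 1800, 1.0, 1.0             # one read per second for 1800 s
single = slope_error_large_n(n, dt, sigma)
reference = sigma / (n * dt)               # assumed reference: one read over the full ramp
reduction = reference / single             # equals sqrt(n / 12)
switch_ratio = 1800.0 / 30.0               # assumed double-to-single ratio for 30 s switching
increase = switch_ratio / reduction
print(f"{reduction:.1f} {increase:.1f}")
```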

Finally we note from Section 5.4 that in any case the assumption that
the source is of constant brightness and that counts ∝ time is very
dubious for the sky. The airglow is a stochastic phenomenon with a lot
of variation and will deviate from a linear growth. This will generate
artificial noise in a line-fitting approach, even with the classical
single-line fit. NDR slope-fitting has become a standard technique at
many observatories, but the effects of sky-background variations on
noise have not been studied.

Fig. 12. Illustration of the 'IR nod-NDR concept'.
As counts are accumulated in the NDR mode, the
telescope is switched periodically between OBJECT
and SKY. A double-slope least-squares fit is
performed to derive the OBJECT and SKY count rates.
It turns out that this is not useful (see text).

7.1.3. Physical array shifting

The most reasonable option for mimicking
something like charge shuffling is to form two
adjacent images at the detector, either by nodding
the collimator or by physically moving the array.
Present IR arrays are 1024x1024 pixels in size,
although Rockwell are expected to produce
2048x2048 formats in the near future.
'Detector nodding' is much the preferred option,
for a number of reasons. First, a nodding collimator
leads to different light paths for the object and
sky positions. Secondly, in infrared instrumentation
the collimator must image the pupil onto the
cold stop with care. Thirdly, the physical tolerances
at the collimator are made much tighter by any
focal reduction, compared with the tolerances on
detector movement. Finally, the array has by far
the lowest mass of any component of the system,
and a 1 Hz movement through a few millimeters
is not an excessive strain on the electrical bonds.

An advantage of IR 'shuffling' over optical
shuffling is that the stored charge is not subject to
trapping sites. Furthermore, the detector needs
only to be partitioned into two panels rather than
the three panels of optical CCDs. A distinct
disadvantage is that in IR shuffling the flatfield
structure will be different in the OBJECT and SKY
regions. However, this effect can be averaged out by
swapping the OBJECT and SKY positions on the
array between successive exposures.

For a detector with 18 μm pixels, the physical
movement of the array should be accurate to better
than 2% of a resolution element (assumed to
be 3 pixels), i.e. 0.02 x 3 x 18 μm ≈ 1 μm.
Precision movement at this level is routinely
achieved in, say, a mechanism for optical focussing,
but within a cryogenic environment 1 μm accuracy
presents a moderate challenge. This seems feasible
with either a linear variable differential
transformer (LVDT) or a linear encoder.
Piezo-electric control at cryogenic temperatures is
a more difficult prospect. We note that a well sampled
resolution element (say 5 pixels wide) may in fact
allow wavelength calibration between the object and
sky exposures to sufficiently high accuracy that the
positioning precision can be relaxed and corrected in
post-analysis. However, data analysis is greatly
simplified by the ability to remove sky accurately by
straight subtraction, since no interpolation is
required.



7.2. Applications to non-contiguous spectroscopy

The nod-shuffle technique allows accurate
sky-subtraction without requiring sky spectra which
are spatially contiguous on the detector and the
sky. Thus it is particularly suitable for
non-contiguous optical systems such as fiber
spectrographs and integral field unit spectrographs
(IFUs), both fiber based and non-fiber based.

The application of fibers to faint spectroscopy has
been limited by sky-subtraction accuracies of
typically 3% (Wyse & Gilmore 1992), which are due to
variable fiber transmission. The nod-shuffle
technique can be applied to fiber spectrographs
provided there is spare room on the detector, as
outlined earlier: the 2D shuffled subframe of SKY
spectra through the fibers is simply subtracted from
the 2D OBJECT subframe.

Because the observations are quasi-simultaneous, the
effect of varying fiber throughput, which varies on a
much longer timescale (hours), will cancel out, as the
sky is observed through exactly the same fibers. At
the AAT we have already experimented with
nod-shuffle using the Two Degree Field fiber
spectrograph and have obtained shot-noise-limited
subtraction, implying systematics << 1% (Glazebrook
et al. 1999).
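
The cancellation of fiber throughput in the subtraction can be illustrated with a toy example. This is a minimal sketch, not the reduction code used at the AAT: the frame layout (OBJECT rows followed by SKY rows), the function name `subtract_shuffled`, and all numerical values are hypothetical, and noise is ignored.

```python
def subtract_shuffled(frame, n_rows):
    """Subtract the SKY subframe from the OBJECT subframe of a shuffled 2D frame.

    Assumed (hypothetical) layout: the first n_rows rows hold OBJECT spectra
    and the next n_rows rows hold SKY spectra taken through the same fibers.
    """
    object_sub = frame[:n_rows]
    sky_sub = frame[n_rows:2 * n_rows]
    return [
        [o - s for o, s in zip(obj_row, sky_row)]
        for obj_row, sky_row in zip(object_sub, sky_sub)
    ]


# Toy data: three fibers (one row each), four spectral pixels.
throughput = [1.00, 0.97, 0.90]          # unknown relative fiber transmission
sky_spec = [100.0, 500.0, 120.0, 100.0]  # bright sky, with one strong line
obj_spec = [1.0, 1.0, 2.0, 1.0]          # faint object

# Each fiber scales both the OBJECT and the SKY exposure by the same factor.
object_rows = [[t * (o + s) for o, s in zip(obj_spec, sky_spec)] for t in throughput]
sky_rows = [[t * s for s in sky_spec] for t in throughput]
frame = object_rows + sky_rows

difference = subtract_shuffled(frame, len(throughput))
for t, row in zip(throughput, difference):
    print(t, [round(v, 3) for v in row])
```

In this sketch the sky term vanishes from every row whatever the fiber transmission is; the transmission survives only as a scale factor on the object spectrum, which matters for spectrophotometry but not for sky subtraction.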

The application to IFUs is also straightforward.
Accurate sky-removal is achieved by subtracting
the shuffled frames before individual IFU element
spectra are extracted and assembled to make a
data cube. Just as in the slit case, the object
could be nodded between two positions on the IFU,
or the nod throw could be large enough to move the
whole IFU to clear sky. While the effect of the
relative calibration of elements on sky-subtraction
is eliminated, that calibration must still be solved
if accurate spectro-photometry is desired.

7.3. Ultra-Deep spectroscopic exposures 

The promise of nod-shuffling is of course that
the extreme precision of sky cancellation will allow
very long, deep spectroscopic exposures. It is
interesting to compare ground-based spectroscopy
with space astronomy (X-ray, IR, etc.). In the
latter it is common to see total exposures of many
days to weeks, whereas in the former it is rare to
see total exposures of longer than a night's
observing.

Why is this? Because the sky background is so high,
one reaches a limit after only a few hours of
observing, beyond which one is dominated by the
systematics of how well the sky can be removed. This
is doubly important because the sky spectrum
exhibits extraordinarily complex structure. Also, as
we have seen in Section 5.1, there are a large number
of separate instrumental effects which all operate
at the 0.5-1% level.

The beauty of the nod-shuffle technique is that 
it is a perfectly differential experiment and all of 
these effects are removed simultaneously from the 
sky-subtraction process. They still affect the ob- 
ject spectrum, but that is far less important com- 
pared to the random noise. 

The question then arises: will the nod-shuffle
technique permit the use of ultra-deep exposures,
lasting 10^ - 10^ secs, for optical spectroscopy? We
believe it can. At the level of sky-subtraction
precision demonstrated, we estimate that one could
obtain a good spectrum of an $I_{AB} = 27.2$ galaxy
(i.e. 3σ above the sky limit). At a resolution of
R ~ 800 one could reach this in a 200,000 sec
exposure (7 nights) on a 10m telescope with a 35%
efficient spectrograph. Using microslits one could
squeeze many parallel targets into even a small
field.

We re-emphasize that we believe our current
sky-subtraction error is only an upper limit on
what can be achieved. The sky background can be
reduced further by observing at higher resolution
so that the OH lines do not dominate the spectrum.
The intra-OH continuum variations may be far less
rapid, so even greater accuracy could be obtained.
(On the basis of laboratory experiments (Abrams et al.
1994), there may exist fainter rotational-vibrational
bandheads in between the bright OH bands, in which
case the intra-OH 'continuum' varies on the same
timescales as the rest of the OH emission. However,
the actual contribution of these putative features to
the intra-OH background light remains highly
uncertain.) Assuming we could reach 10^** of the sky
at a resolution R = 5000, then the faintest object
would be $I_{AB} = 29$. This could be reached in a
$10^8$ second exposure (3 years!) or a more
reasonable 10^ exposure if the spectrum was post-hoc
rebinned to R = 500.

30-50m telescopes are being planned at the
time of writing; these would reach the same limits
an order of magnitude faster. We emphasize
that without nod-shuffle, or equivalent, techniques
these telescopes would reach the systematic limit
for spectroscopy in a mere one hour exposure!
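
The exposure times quoted in this section are easier to judge in observing units. The sketch below converts seconds into nights and years; the 8 hours of usable dark time per night is an assumed value and the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600.0
SECONDS_PER_YEAR = 365.25 * 24.0 * SECONDS_PER_HOUR


def exposure_in_nights(exposure_s, hours_per_night=8.0):
    """Number of nights needed, assuming a fixed usable time per night."""
    return exposure_s / (hours_per_night * SECONDS_PER_HOUR)


def exposure_in_years(exposure_s):
    """Continuous elapsed time in years (no allowance for daytime or weather)."""
    return exposure_s / SECONDS_PER_YEAR


print(f"2e5 s = {exposure_in_nights(2.0e5):.1f} nights at 8 h per night")
print(f"1e8 s = {exposure_in_years(1.0e8):.2f} years of continuous exposure")
```

With these assumptions a 200,000 s exposure is about 7 nights, and a 3-year exposure corresponds to roughly $10^8$ s of continuous integration.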

8. Summary and Conclusions 

We have explained the virtues of the nod-shuffle
technique for CCD-based optical spectroscopy: we
reach a new level of sky-subtraction precision of
0.04%. This is in accord with predictions from a
reasonable physical model of atmospheric airglow.

This technique also permits a great increase in
the multiplex gain of multi-slit spectrographs. We
have quantified those gains and shown that they
are greatest in regimes of high object density.

We have outlined our thoughts on IR techniques
equivalent to nod-shuffle. New circuit designs might
allow charge storage, but would need to be
developed. Given the importance of IR spectroscopy
on future large telescopes, the scientific case for
doing so is strong. Failing this, we have outlined a
less satisfactory, but still useful, concept for
physically moving the IR array.

For very large telescopes (10m and greater) the
precision of sky-subtraction is a real barrier for
ultra-deep spectroscopic exposures. The systematic
limit of ordinary slit subtraction is reached in
only a few hours. The nod-shuffle technique offers
a remedy and promises the possibility of extremely
long exposures; its ultimate performance remains to
be explored.